FDR2-BD: A Fast Data Reduction Recommendation Tool for Tabular Big Data Classification Problems


Abstract

In this paper, a methodological data condensation approach for reducing tabular big datasets in classification problems is presented, named FDR2-BD. The key of our proposal is to analyze data in a dual way (vertical and horizontal), so as to provide a smart combination between feature selection to generate dense clusters of data and uniform sampling reduction to keep only a few representative samples from each problem area. Its main advantage is allowing the model's predictive quality to be kept in a range determined by a user's threshold. Its robustness is built on a hyper-parametrization process, in which all data are taken into consideration by following a k-fold procedure. Another significant capability is being fast and scalable by using fully optimized parallel operations provided by Apache Spark. An extensive experimental study is performed over 25 big datasets with different characteristics. In most cases, the obtained reduction percentages are above 95%, thus outperforming state-of-the-art solutions such as FCNN_MR that barely reach 70%. The most promising outcome is maintaining the representativeness of the original data information, with prediction values around 1% of the baseline.
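The idea of reducing the data as far as a user's quality threshold allows can be illustrated with a minimal, single-machine sketch. It is not the FDR2-BD implementation: it skips feature selection, clustering, the k-fold procedure, and Apache Spark entirely, and it uses class labels as a stand-in for the "problem areas" mentioned in the abstract. The function names, the `evaluate` callback, and the list of candidate fractions are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def uniform_sample_per_class(rows, labels, keep_fraction, seed=0):
    """Keep about keep_fraction of each class, chosen uniformly at random.

    Assumption: class labels stand in for the paper's "problem areas".
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    kept = []
    for indices in by_class.values():
        # Always keep at least one sample per class.
        n_keep = max(1, round(len(indices) * keep_fraction))
        kept.extend(rng.sample(indices, n_keep))
    kept.sort()
    return [rows[i] for i in kept], [labels[i] for i in kept]


def strongest_reduction(rows, labels, evaluate, fractions, threshold):
    """Return (fraction, rows, labels) for the smallest keep fraction whose
    score stays within `threshold` of the full-data baseline.

    `evaluate` is a hypothetical user-supplied function that trains a model
    on the given data and returns a quality score (higher is better).
    """
    baseline = evaluate(rows, labels)
    best = (1.0, rows, labels)
    # Try progressively stronger reductions; stop at the first failure.
    for fraction in sorted(fractions, reverse=True):
        reduced_rows, reduced_labels = uniform_sample_per_class(
            rows, labels, fraction
        )
        if baseline - evaluate(reduced_rows, reduced_labels) <= threshold:
            best = (fraction, reduced_rows, reduced_labels)
        else:
            break
    return best
```

In use, `evaluate` would wrap whatever classifier and validation scheme is being protected, and `threshold` plays the role of the user's tolerated loss in predictive quality (for example 0.01 for a one-point drop in accuracy).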


Similar resources

A Distributed Recommendation Platform for Big Data

The vast amount of information that recommenders manage these days has reached a point where scalability has become a critical factor. In this work, we propose a scalable architecture designed for computing Collaborative Filtering recommendations in a Big Data scenario. In order to build a highly scalable and fault-tolerant platform, we employ fully distributed systems without any single point ...


A Fuzzy TOPSIS Approach for Big Data Analytics Platform Selection

Big data sizes are constantly increasing. Big data analytics is where advanced analytic techniques are applied on big data sets. Analytics based on large data samples reveals and leverages business change. The popularity of big data analytics platforms, which are often available as open-source, has not remained unnoticed by big companies. Google uses MapReduce for PageRank and inverted indexes....


Correct classification for big/smart/fast data machine learning

Table (database) / Relational database. Classification for big/smart/fast data machine learning is one of the most important tasks of predictive analytics and of extracting valuable information from data. It is a core applied technique for what is now understood as data science and/or artificial intelligence. Widely used Decision Tree (Random Forest) and rarely used rule-based PRISM, VFST, etc. classif...


Fuzzy Data Envelopment Analysis for Classification of Streaming Data

The classification of fuzzy uncertain data is considered one of the most challenging issues in data analysis. In spite of the significance of fuzzy data in mathematical programming, the development of the analytical methods of fuzzy data is slow. Therefore, the current study proposes a new fuzzy data classification method based on fuzzy data envelopment analysis (DEA) which can handle strea...




Journal

Journal title: Electronics

Year: 2021

ISSN: 2079-9292

DOI: https://doi.org/10.3390/electronics10151757